The effect of model structure and data availability on Usutu virus dynamics at three biological scales

Understanding the epidemiology of emerging pathogens, such as Usutu virus (USUV) infections, requires systems investigation at each scale involved in the host–virus transmission cycle, from individual bird infections, to bird-to-vector transmissions, and to USUV incidence in bird and vector populations. For new pathogens field data are sparse, and predictions can be aided by the use of laboratory-type inoculation and transmission experiments combined with dynamical mathematical modelling. In this study, we investigated the dynamics of two strains of USUV by constructing mathematical models for the within-host scale, bird-to-vector transmission scale and vector-borne epidemiological scale. We used individual within-host infectious virus data and per cent mosquito infection data to predict USUV incidence in birds and mosquitoes. We addressed the dependence of predictions on model structure, data uncertainty and experimental design. We found that uncertainty in predictions at one scale change predicted results at another scale. We proposed in silico experiments that showed that sampling every 12 hours ensures practical identifiability of the within-host scale model. At the same time, we showed that practical identifiability of the transmission scale functions can only be improved under unrealistically high sampling regimes. Instead, we proposed optimal experimental designs and suggested the types of experiments that can ensure identifiability at the transmission scale and, hence, induce robustness in predictions at the epidemiological scale.


Introduction
Usutu virus (USUV) is an emerging zoonotic flavivirus similar to West Nile virus (WNV) [1,2] that circulates in sub-Saharan Africa, central Europe and the Mediterranean basin [3][4][5].It is maintained in the environment through an enzootic cycle involving mosquitoes and birds; and has been associated with decreased bird populations in Europe, occasional spillover to mammals, including humans [2,6], and a few cases of neurological complications in humans [2,7,8].Given the novelty of the USUV, it is of great importance to understand how individual-scale processes impact population-scale spread.Since the virological and epidemiological patterns are usually independent of each other, mathematical modelling can serve as a tool to link knowledge at these scales.To better understand the bird-mosquito USUV transmission cycle, we investigate the within-host viral dynamics inside an infected bird, the bird-to-mosquito transmission probability and the epidemiological-level bird and mosquito infection incidence.
In previous work, we used within-host mathematical models to study the viral-host interactions of different USUV strains following individual bird ( juvenile chicken) infections and found that the European USUV strains (Spain 2009 and Netherlands 2016) have higher replication rates and peak viremia compared with the African USUV strains (Uganda 2012 and South Africa 1959) [9,10].A recent empirical study has shown that birds (wild-caught house sparrows and juvenile chickens) inoculated with Netherlands 2016 have higher probability of infection to mosquito (Culex quinquefasciatus) than birds inoculated with Uganda 2012 [11].
In this study, we investigate the relationship between USUV strain characteristics and per cent mosquitoes infection, and use that inference to assist how model and data structure at these scales impact model predictions of USUV incidence in bird and mosquito populations.To achieve this, we develop an age-structured vector-borne disease model and parametrize it using Netherlands 2016 and Uganda 2012 longitudinal infectious viral titres in infected birds combined with bird-to-mosquito transmission data [11].We use the model to predict the mechanistic interactions that describe USUV incidence dynamics in bird and mosquito populations.The paper also aims to determine whether scarcity of data used in validating within-host models and bird-to-vector probability of infection functions results in biases in the predicted disease incidence.To address this, we investigate structural and practical identifiability of the within-host model and practical identifiability of the bird-to-vector probability of infection functions and propose solutions for improvement in predictions through addition of extra and/or alternative measurements.

Within-host scale
Within-host model.We consider a within-host model with eclipse phase used in other acute viral infections [9,10,12,13] to describe the within-host USUV dynamics.The model includes uninfected leukocytes T, exposed leukocytes E, productively infected leukocytes I and USUV V. We investigate their dynamics τ days after infection, as follows [9].Following viral infection, target cells become exposed at rate β and exposed cells become productively infected cells at rate k.Productively infected cells produce π virions per day and die at rate δ.Virus is cleared at rate c.Given the short timespan of USUV infection within an infected bird (around 7 days) no renewal or death terms are considered for the leucocyte populations.A diagram describing these interactions is given in figure 1 and the model is described by the following equations: with initial conditions T(0) = T 0 , E(0) = 0, I(0) = 0, V(0) = V 0 .Note that the effect of viral loss through cell entry, given by −βTV, is negligible compared with virus production and clearance terms and, therefore, royalsocietypublishing.org/journal/rsos R. Soc.Open Sci.11: 231146 not included in the virus equation.We aim to estimate model equation (2.1)'s parameters by fitting it to published infectious virus titres in wild-caught house sparrows infected with either Netherlands 2016 or Uganda 2012 [11].Empirical data.Data consists of two cohorts (n = 14 each) of wild-caught house sparrows, confirmed to be seronegative for WNV, and subcutaneously inoculated with 1500 plaque forming units (PFU) of USUV Netherlands 2016 or Uganda 2012.Infectious viral titres (in PFU ml −1 ) V d t were collected daily for the first 7 days post inoculation τ = {1, 2, …, 7}.Of the n = 14 birds inoculated with Netherlands 2016 only n = 11 had virus measured above the limit of detection (LOD = 100 PFU ml −1 ) and only n = 10 had at least two viral titre measurements above the limit of detection.All n = 14 birds inoculated with Uganda 2012 had viremia above the limit of detection, but only n = 6 had at least two viral titres above the limit of detection [11].
Data fitting algorithm.We assume an initial target population Tð0Þ ¼ 4 Â 10 6 cells ml À1 [10] and no exposed or infected cells Eð0Þ ¼ Ið0Þ ¼ 0 cells ml À1 .The empirical infectious inoculum was 1500 PFU which, when distributed across 2.5 ml of blood [14], gives an initial inoculum concentration of 600 PFU ml −1 .We assumed an initial Vð0Þ ¼ 10 PFU ml À1 infectious titre, and rejected Vð0Þ ¼ 100 PFU ml À1 and Vð0Þ ¼ 600 PFU ml À1 initial conditions through model selection (see table S1 in the electronic supplementary material).All other parameters p = {β, δ, c, π, k} are unknown.We only use the birds that have at least two data points above limit of detection, n = 10 and n = 6 for Netherlands 2016 and Uganda 2012 infection, respectively.We exclude bird U-Sp8 from Uganda 2012 fits due to unrealistic data fitting predictions (see electronic supplementary material, figure S1).Hence, the number of birds inoculated with Uganda 2012 that are used in this study becomes n = 5.
We estimate the mean, median and standard deviation of population parameters p using a nonlinear mixed effects modelling approach that uses stochastic approximation estimation-maximization (SAEM) algorithm in Monolix [15].Briefly, we assume that population parameters p are lognormally distributed with mean ln(μ) and standard deviation σ.Moreover, we assume that the proportional error between the model and the data, ðV d t À VÞ=V, is normally distributed with mean zero and standard deviation η.Since several data measurements are below the limit of detection (LOD = 100 PFU ml −1 ), we consider these measurements to be censored data.They are incorporated into the data fitting in Monolix by assuming that each censored data point takes a value between zero and the LOD [16,17].Besides mean population parameters, we also estimate individual bird parameters p i (for bird i) using the Nelder-Mead (NM) simplex algorithm in Monolix software [15].Model equation (2.1)'s goodness of fit, given by the Akaike information criterion (AIC) index, is where LL is the log-likelihood and N is the number of estimated parameters in the nonlinear mixed effects model (in our case N = 11, 10 for the individual mean and standard deviations of the five fitted parameters and one for the population standard deviation).

Bird-to-mosquito transmission scale
Bird-to-mosquito probability of infection mathematical formulation.To determine the relationship between the USUV load inside a bird V(τ) and the probability of a transmission event to a mosquito at day τ, we develop three models describing the probability of bird-to-mosquito transmission as bTV Within-host model: royalsocietypublishing.org/journal/rsos R. Soc.Open Sci.11: functions of viral load V(τ).We assume the number of infectious viruses in the sample that establish an infection in a mosquito is a random variable that follows a Poisson distribution with mean E[V ].If each transmissible virus has a probability ρ of establishing infection in a mosquito, then the number of viruses that are successfully transmitted follows a Poisson distribution with parameter λ = E[V ]ρ [18] and the probability of one or more viruses successfully infecting a mosquito is bðVÞ ¼ 1 À expðÀlÞ: ð2:2Þ We consider the following three models for the mean transmissible virus E(V ) [19], 1.The linear model assumes that the mean transmissible virus is proportional to the detectable viral load (above the limit of detection), i.e.
where D ¼ LOD ¼ 100 PFU ml À1 and ν is a positive constant.The corresponding probability of infection becomes where a = νρ.2. The power-law model assumes that the mean transmissible virus is proportional to the power of birds's detectable viral load, i.e.
where ν and h are positive constants.The corresponding probability of infection becomes where a = νρ.

3.
The density-dependent model assumes that the mean transmissible virus is proportional to a densitydependent function of the bird's detectable viral load, expressed by a Hill-type function, i.e.
where L is the above limit of detection bird's viral load where the growth is half-maximal and ν and h are positive constants.The corresponding probability of infection becomes where a = νρ.
Empirical data.Culex quinquefasciatus mosquitoes were fed on birds infected with USUV Netherlands 2016 and Uganda 2012, as follows.Mosquitoes were fed on seven birds: three of the wildcaught house sparrows used in the within-host models (N-Sp1, N-Sp9 and N-Sp3) and four juvenile chickens (bred for low antibody titres) τ = 2 days after they were inoculated with 1500 PFU Netherlands 2016 [11].The paired bird viral load data (on log 10 scale) at day 2, log 10 v j ¼ log 10 V j ð2Þ, and proportion of infected mosquitoes data b d j for bird inoculated with Netherlands 2016 is for U-Sp2, U-Sp3, U-Sp5 and four juvenile chickens, respectively.Data fitting algorithm.We fitted the probability of infection functions b 1 ðlog 10 vÞ, b 2 ðlog 10 vÞ and b 3 ðlog 10 vÞ to both T n and T u per cent mosquito infection data using a least-squares approach.The following objective functional: where c v is the contact rate, γ b is the bird removal rate (through recovery and/or death) and b i (V(τ)), i = {1, 2, 3} is the probability of vector infection following contact with an infected bird given by equations (2.3)-(2.5).Birds get infected through contact with a vector at constant rate β b and are removed (through recovery and/or death) at constant rate γ b .We use a mass-action term for the force of infection, which assumes that biting rates are limited by the mosquito and bird densities.We assume a mosquito birth rate L v and per capita death rate μ v .Similarly, we assume a bird birth rate L b and per capita death rate μ b .A model description is given in figure 2 and the model equations are We work with normalized bird and mosquito populations.Note, that while there is no recovery of infected mosquitoes, birds do recover from USUV infection.We aim to predict bird and mosquito populations' infection incidence over time if the probability of infection is described by different functions b i (V(τ)) given by equations (2.

Identifiability analysis
We have defined the USUV dynamics at three different scales: within-host, bird-to-mosquito transmission and between-host.We will validate the within-host and bird-to-mosquito transmission scales with experimental data for both Netherlands 2016 and Uganda 2012 strains.Before estimating parameters of a mathematical model from data, we first need to study whether the model is structured to reveal its parameters from the data.This process is called the identifiability analysis.There are two types of identifiability: structural identifiability and practical identifiability.Structural identifiability describes the theoretical base for determining model parameters uniquely given the model structure and unlimited noise-free experimental observations.By contrast, practical identifiability determines whether model parameters are identifiable given limited experimental observations which are contaminated with noise.
Structural identifiability.Consider a model of form x 0 ðtÞ ¼ f ðxðtÞ, pÞ, ð2:11Þ where τ denotes age of infection, x(τ) represents the vector of state variables expected to match the empirical observations of viral loads, y(τ).The goal of the structural identifiability is to determine whether model x 0 (τ) = f (x(τ), p) can uniquely reveal its parameters p given unlimited empirical observations y(τ) and no measurement errors.Several techniques have been proposed for analysing the structural identifiability of mathematical models, including those found in [22][23][24].In this study, we use the differential algebra method which allows us to eliminate unobserved state variables and derive differential-algebraic polynomials of the observed variable and the model parameters, referred to as the input-output equations [24][25][26].
Between-host model:  royalsocietypublishing.org/journal/rsos R. Soc.Open Sci.11: 231146 Definition 2.1.Let c( p) denote the coefficients of the input-output equations where p is model equation (2.11)'s parameter vector.We say that model equation (2.11) is structured to reveal its parameters from the observation y(τ) if and only if The structural identifiability analysis of ODE models using the differential algebra approach involves the following steps.First, model equations are transformed into a monic differential polynomial called input-output equations by eliminating unobserved state variables.The input-output equations establish the relationship between the model parameters and the observations.Next, coefficients of input-output equations are determined and the mapping from the parameter space to these coefficients is examined.If the map is one-to-one, then all parameters are identifiable, indicating that the model is structurally identifiable.If the map is not one-to-one, then the model lacks structurally identifiability.In such cases, parameter correlations are used to fix parameters, resulting in a structurally identifiable model with improved parameter estimation accuracy and reliability.
Practical identifiability.Structural identifiability investigates whether parameters can be estimated from a model given unlimited noise-free observations.Since, in practice, data collected is discrete and may contain significant measurement errors, structural identifiability of a model is not enough to guarantee existence of practically identifiable parameters.Therefore, it becomes necessary to determine whether structurally identifiable parameters can be estimated from noisy data.There are several methods for studying the practical identifiability of ODE models [23,24,[26][27][28][29][30][31].Here, we use the Monte Carlo simulation (MCS) approach [23,24,26,30] which follows the following steps: 1. We solve model equation (2.11) A model is said to be practically identifiable when parameters p (k) are practically identifiable, for all k.

Within-host scale
To investigate the temporal dynamics of USUV inside infected birds we developed a within-host model equation (2.1) and fitted it to experimental data from two bird cohorts (see Material and methods).where τ denotes time since infection, x = {T, E, I, V} represents the vector of state variables, and p = {β, δ, c, π, k} is the vector of parameters.The model output y(τ) represent the empirical observations, which, in our case, are the viral infectious titres, y(τ) = V(τ).We compute the input-output equations using the Differential Algebra for Identifiability of System (DAISY) software [32] for equation (2.1) and obtain According to definition 2.1 (see Material and methods), we need to show that the function mapping the parameter space to the coefficients of the input-output equation (3.1) is one-to-one.Assume that there exists another parameter vector, p ¼ f b, k, d, p, ĉg, which has produced the same observation, y(τ) = V(τ).By setting cðpÞ ¼ cðpÞ, we obtain Solving this nonlinear system of equations in Mathematica, we obtain the following six sets of solutions ð3:2Þ This reveals that only β is globally identifiable, whereas c, k and δ are locally identifiable.Lastly, parameter π cannot be identified as it does not appear in the input-output equation (3.1).We summarize the structural identifiability of model equation (2.1) (when we do not take into account initial conditions) in the following proposition.
Proposition 3.1.The within-host model equation (2.1) is not structured to reveal its parameters from the viral load V(τ) observations.More precisely, the infectivity rate β is globally identifiable.The eclipse rate k, killing rate δ and clearance rate c are locally identifiable.The viral production rate π, is not identifiable from unlimited, noise-free viral load observations V(τ).
To acquire a better understanding of why parameter π is not identifiable, we scale the unobserved state variables by a positive constant e .0 while keeping the observed variable unchanged, i.e. f T, Ê, Î, Vg ¼ feT, eE, eI, Vg.The scaled model becomes which is almost identical to equation (2.1) except that the parameter π is scaled by 1=e and initial value T(0) is scaled by e.Without initial conditions, it is not possible to determine the scaling factor e, and hence parameter π.When, however, all initial conditions are known, all parameters of model equation (2.1) become globally identifiable.We summarize these results, as follows.
Proposition 3.2.The within-host model equation (2.1) is structured to identify all its parameters from unlimited, noise-free viral load observations V(τ) if all initial conditions are known.

Within-host model dynamics
As seen above, under known initial conditions and unlimited noise-free measurements, all parameters of model equation (2.1) can be identified from data.We next fitted model equation (2.1) to empirical infectious virus titre data from two bird cohorts inoculated with Netherlands 2016 and Uganda 2012 using a nonlinear mixed effects approach (see Material and methods).A summary of mean and standard deviations for estimated population parameters together with median parameter estimates and AIC values is given in table 1.A summary of individual estimated parameters for birds inoculated with Netherlands 2016 is given in electronic supplementary material, table S2 and for those inoculated with Uganda 2012 is given in electronic supplementary material, table S3.The model dynamics are plotted in figure 3 and the lognormal parameter distributions in figure 4. Model fitting results in similar population parameter estimates for infected cells death rates, d ¼ 6:95 day À1 and d ¼ 6:74 day À1 (corresponding to lifespan of 3.5 hours) for Netherlands 2016 and Uganda 2012 infections, respectively.The average population estimates for the infectivity and production rates, β and π, are 3 and 2.4 times higher for Uganda 2012 compared with Netherlands 2016.By contrast, the average viral clearance rate, c, is 1.3 times lower for Uganda 2012 compared with Netherlands 2016.This results in faster expansion and slower clearance for Uganda 2012.Lastly, the average population estimates for the eclipse phase, 1/k, is 3.3 and 7.1 hours for Netherlands 2016 and Uganda 2012, respectively.These estimates result in average population basic reproduction number, R 0 = βp T(0)/cδ, of 4.1 (range of 2.2-7) and 43 (range of 2.2-56) for Netherlands 2016 and Uganda 2012, respectively.Interestingly, we find higher R 0 and π predictions for Uganda 2012 compared with Netherlands 2016 in wild-caught house sparrows, which is different from higher R 0 values for Netherlands 2016 compared with Uganda 2012 infections in juvenile chickens [9].However, as in the juvenile chicken studies, we observe delayed clearance of Uganda 2012 compared with clearance of Netherlands 2016 in wild-caught house sparrows [9].In both cohorts, we observe variability in individual viral profiles (electronic supplementary material, figures S2 and S3), and consequently differences in parameter estimates (electronic supplementary material, tables S2 and S3).This heterogeneity is larger among the birds inoculated with Uganda 2012, for which the number of birds was small (n = 5 compared with n = 10 for Netherlands 2016).For Netherlands 2016 infections, we predict that virus peaks 2-3 days post infection ( p.i.), while for Uganda 2012 infections it peaks 0.5-6 days p.i. Virus decays below one virion ml −1 4.5-7 days p.i. and 4.3-20 days p.i. for Netherlands 2016 and Uganda 2012, respectively.The mean peak viremia is similar between the two strains and equal to 1.5 × 10 6 PFU ml −1 , and the intra cohort estimates range between 545 and 8.2 × 10 6 PFU ml −1 for the Netherlands 2016 and between 2.3 × 10 3 and 6.2 × 10 6 PFU ml −1 for Uganda 2012.As expected, the day of the virus peak is negatively correlated to the estimated R 0 for both Netherlands 2016 and Uganda 2012 infections, with Pearson's correlation coefficients −0.83 and −0.84, respectively.The magnitude of the virus peak is positively correlated to the estimated R 0 in Netherlands 2016, with a Pearson's correlation coefficient of 0.97.This positive correlation is not observed in Uganda 2012 (see electronic supplementary material, figure S4).
We observed several sources that explain variability in parameter estimates (and unexpected predictions of higher R 0 for Uganda 2012 infection): bimodal distributions for parameter β estimates (figure 4) and correlations between β and π (see electronic supplementary material, figures S5 and S6).2B).Interestingly, parameters β and π are not identifiable for Uganda 2012 despite the fact that they are both structurally identifiable (under known initial conditions).We hypothesized that the nonidentifiability is due to limited number of birds that have detectable infectious viral titre data [33] (with a single subject having more than two data points above limit of detection).To investigate the dependence of practical identifiability on the number of data points, we repeat the MCS for frequent sample data.We create a synthetic dataset where we assume that two data points were collected each day (every 12 hours, for a total of 14 data points) and compute the new ARE values for these sets.Practical identifiability of β and π has improved significantly, with the average relative error ARE decreasing for both parameters (table 3) and the within-host model equation (2.1) becoming strongly practically identifiable for both Netherlands 2016 and Uganda 2012 (table 3A,B).Note, that this is the minimum number of extra measurements that guarantees practical identifiability of model equation (2.1).royalsocietypublishing.org/journal/rsos R. Soc.Open Sci.11: 231146

Bird-to-mosquito transmission scale
Transmission of USUV from an infected bird to a mosquito is dependent on both the infectiousness of the host over age of infection and the probability of contact between a host and a vector.To best describe these interactions we developed three models of probability of host-to-vector transmission equations (2.3)-(2.5)and fitted them to experimental data (see Material and methods).

Dynamics of the bird-to-vector probability of infection functions
Based on AIC values, we found that the linear probability of infection function b 1 (V(τ)) best fits the per cent mosquito infection data for Netherlands 2016 and the power-law probability of infection function b 2 (V(τ)) best fits the per cent mosquito infection data for Uganda 2012 (figure 5, table 4, electronic supplementary material, figure S7 and table S4).For the resulting b 1 (V(τ)) function for the Netherlands 2016 strain and b 2 (V(τ)) function for the Uganda 2012 strain, we computed the relative error, η i , between the predicted probability of infection functions b i ðlog 10 VÞ and the per cent infected mosquitoes data b d for i = {1, 2, 3}.We found that the relative error is h 1 ¼ 46% for b 1 ðlog 10 VÞ and Netherlands 2016 and h 2 ¼ 34% for b 2 ðlog 10 VÞ and Uganda 2012.
We next performed parametric bootstrapping [34][35][36], where we generated 1000 synthetic datasets with the same structure (number of points and identical sampling time) as the original dataset and η i standard deviation from it.We fitted equations (2.3) and (2.4) to the synthetic data, and the resulting curves (green graphs in figure 5) show an error region for b 1 (V(τ)) and b 2 (V(τ)), respectively.

Practical identifiability of probability of infection functions
To address whether the probability of infection function b 1 (V(τ)) for Netherlands 2016 and b 2 (V(τ)) for Uganda 2012 are practically identifiable, we applied the MCS algorithm (see Material and methods) to parameters of models equations (2.3) and (2.4) for virtual datasets obtained by adding noise levels s ¼ f1%, 5%, 20%g to the seven ðlog 10 v, b d Þ data values for the two viral strains [11].We found that for function b 1 (V(τ)) and Netherlands 2016 data, parameter a is weakly practically identifiable  (table 5A).For function b 2 (V(τ)) and Uganda 2012 data, a is unidentifiable and h is weakly identifiable (table 5B).As with the within-host model, we expect to improve ARE values by increasing data frequency.Indeed, increasing the number of ðlog 10 v, b d Þ data points from 7 to 60 results in strong practical identifiability of a for function b 1 (V(τ)) and Netherlands 2016 data (table 6A).Moreover, it results in weak practical identifiability of a and strong practical identifiability of h for model b 2 (V(τ)) for Uganda 2012 data (table 6B).

Between-host scale
Using an age-structured modelling approach which couples within-host and between-host models, one of our goals was to determine how individual bird USUV infections, combined with bird-to-mosquito probability of infection, influence predictions of USUV incidence in bird and mosquito populations.

Dynamics of the between-host model
We computed the USUV incidence over time in the bird and mosquito populations using a vector-borne SIR model for birds and SI model for mosquitoes (equation (2.10); see Material and methods).To determine differences in disease incidence between the two viral strains, we used transmission functions β v (τ) defined by probability of infection functions that best matched the data of each strain, i.e. b 1 (V(τ)) given by equation (2.3) for Netherlands 2016, and b 2 (V(τ)) given by equation (2.4) for Uganda 2012.We observed similar timing in peak USUV incidence in bird populations, occurring on day 32 for Netherlands 2016 and on day 30 for Uganda 2012 (see figure 6a, solid versus dashed red lines).Peak USUV incidences in mosquito populations occur 26 and 30 days later than peak incidence in bird populations, for Netherlands 2016 and Uganda 2012 strains, respectively (figure 6b).Netherlands 2016 infects more mosquitoes than Uganda 2012, 17% compared with 13% (see figure 6b, solid versus dashed red lines).Both strains, however, infect similar per cent of birds, 7:5% for Netherlands 2016 and 6% for Uganda 2012 (see figure 6a, solid versus dashed red lines).

Optimal experimental design
The USUV incidence model equation (2.10) assumed that the probability of bird-to-mosquito infection functions have fundamentally different characteristics for the two virus strains considered.More specifically, it assumed that the per cent mosquito infection increases linearly with the log 10 infectious viral load for Netherlands 2016, and super-linearly with the log 10 infectious viral load (power coefficient h = 3.67) for Uganda 2012.Since we do not expect that the shape of the bird-to-mosquito infection functions vary by strain, we investigated how the strain-specific disease incidence predictions change when the other two functions (that did not result in a best fit) were considered: b 2 (V(τ)), b 3 (V(τ)) for Netherlands 2016 and b 1 (V(τ)), b 3 (V(τ)) for Uganda 2012.We obtained large variability between predicted USUV incidence in both bird and mosquito populations for both strains (see electronic supplementary material, figure S8).We investigate whether an improved initial empirical dataset will create more robust parameter values in the probability of infection functions, and royalsocietypublishing.org/journal/rsos R. Soc.Open Sci.11: 231146 potentially unify results at the between-host scale.For example, it would be useful to know whether there is a viral load threshold where all mosquitoes get infected, or (alternatively) there is a viral load threshold where mosquito infection levels off.We performed two in silico optimal experimental designs: hypothesis 1 assumed that the viral load where all mosquitoes get infected is known, and hypothesis 2 assumed that per cent mosquito infection levels off at 60% (see electronic supplementary material).We fitted the probability of infection functions equations (2.3)-(2.5) to these datasets (see electronic supplementary material, figures S9 and S10) and found that the power-law function b 2 (V(τ)) becomes the best fit (lowest AIC) for both USUV strains in both scenarios (see electronic supplementary material, tables S7 and S8).This suggests that optimal experimental design together with frequent data measurements is needed to improve predictions.

Discussion
Understanding the epidemiology of emerging pathogens, such as USUV, requires systems investigation at each scale involved in the host-virus transmission cycle, from individual infection in birds, to bird-tovector transmission, to USUV incidence in bird and vector populations and eventually to spillover probability in humans [37][38][39].For new pathogens, field data is sparse, and predictions are based on laboratory-type inoculation and transmission experiments [9,11] combined with dynamical mathematical modelling [9].
In this study, we developed an age-structured within-between-host modelling approach for investigating the differences in the dynamics of two USUV strains, Netherlands 2016 of European descent and Uganda 2012 of African descent, in bird and mosquito populations.Our study investigated the USUV dynamics at three scales: the within-host scale, where the dynamics of individual bird infection were considered; bird-to-mosquito transmission scale, where the per cent of mosquito infections based on virus level was considered; and between-host scale, where a vectorborne age-structured epidemiological model of bird and mosquito interactions was considered.The models developed for the first two scales were validated against laboratory-produced infectious viral titre data from individual bird inoculation studies [11] and against per cent mosquito infection data [11].The last scale had no empirical data for validation.Instead, we used results from the first two scales to make predictions for the dynamics of USUV incidence in bird and mosquito populations.
To determine differences in USUV strain dynamics in individual bird infections (within-host scale), we validated a target cell limited within-host model with eclipse phase equation (2.1) against longitudinal infectious USUV data in wild-caught house sparrows inoculated with either Netherlands 2016 or Uganda 2012 [11].Before performing data fitting, we conducted structural identifiability analyses which showed that, if the number of target cells is not known, it is not possible to estimate the rate at which infected cells produce new viral particles each day.When, however, the initial target cell concentration and initial viral load are known, all within-host model parameters can be uniquely determined from unlimited, noise-free viral load observations [24,26,33].Consequently, we estimated the model parameters using a nonlinear mixed effects approach, and observed heterogeneity in outcomes at both within-and between-strain levels (figure 3).Moreover, we observed correlations between infection rate β and viral shedding rate π, especially for Uganda 2012, for which data were sparse and the number of subjects was low (figure 4, electronic supplementary material, figures S5 and S6).These correlations occurred despite our findings that all parameters are globally structurally identifiable (under unlimited data and known initial conditions).We attributed these results to lack of practical identifiability, given the limited experimental data available.Using practical identifiability analyses, we determined that a minimum of 14 infectious virus titre measurements collected every 12 hours (double the initial amount) are needed to achieve practical identifiability of all within-host model parameters (table 3).This experimental design can be achieved by either sampling each bird twice a day or by creating two cohorts with daily morning or daily evening samples and then fitting models to the entire population data.We anticipate that this key result will serve as a guideline for the experimental designing of other within-host inoculation studies.
To determine differences in bird-to-mosquito USUV transmission for the two viral strains, we developed three probability of infection functions that establish a relationship between the infectious virus titre inside a bird and the per cent of mosquito infection upon contact: the linear model b 1 (V(τ)) given by equation (2.3), the power-law model b 2 (V(τ)) given by equation (2.4) and the densitydependent model b 3 (V(τ)) given by equation (2.5) [19].Data fitting determined that the linear model b 1 (V(τ)) best fits the Netherlands 2016 data and the power-law model b 2 (V(τ)) best fits the Uganda 2012 royalsocietypublishing.org/journal/rsos R. Soc.Open Sci.11: 231146 data (table 4 and electronic supplementary material, table S4).This means that while the per cent mosquito infection grows linearly with the virus load within an infected bird (on log 10 scale) for Netherlands 2016, it grows super-linearly ( power coefficient h = 3.67) with the virus load within an infected bird (on log 10 scale) for Uganda 2012.Biologically, this suggests that the per cent mosquito infections is proportional to the viral load inside the bird they feed on for Netherlands 2016 infections.By contrast, the per cent of mosquito infections is low when the viral load inside the bird they feed on is low and grows exponentially when the viral load inside the bird they feed on is high for Uganda 2012 infections.As with the within-host scale, we investigated the practical identifiability of the selected probability of infection functions and found that, for both viral strains, none of the parameters are strongly identifiable (table 5).This outcome is not surprising considering that only seven birds were used in this experiment and, consequently, only seven data points were used for data fitting.We investigated whether practical identifiability improves under abundant data by creating in silico experiments with increased number of measurements and fitting the probability of infection functions to them.We found that, unlike the within-host model, improving parameter identifiability under addition of data is not straightforward.Even an increase of in silico measurements to 60 (from the original seven), did not result in strong practical identifiability for the power-law model and Uganda 2012 data (table 6).
The ultimate goal of our study was to use the outcomes of the within-host scale and bird-to-mosquito transmission scale (obtained from validating them against data at these biological scales) to make predictions at the between-host scale (for which we have no a priori data) when Netherlands 2016 or Uganda 2012 strains are predominant in the wild.We developed a vector-borne age-structured withinbetween-host model (equation (2.10)) which assumed that the mosquito infectivity rate is dependent on the probability of vector infection given by function b 1 (V(τ)) (equation (2.3)) for Netherlands 2016 and function b 2 (V(τ)) (equation (2.4)) for Uganda 2012.We compared the USUV incidence predictions for these selected (lowest AIC) probability of infection functions and found similar peak disease incidence in birds regardless of the infectious strain and 4% higher peak disease incidence in mosquitoes when Netherlands 2016 (rather than Uganda 2012) was the infectious strain considered.
The transmission scale results predicted that the probability of bird-to-mosquito infection functions have fundamentally different characteristics for the two virus strains, with per cent mosquito infection growing linearly with the log 10 infectious viral load for Netherlands 2016, and super-linearly with the log 10 infectious viral load ( power coefficient h = 3.67) for Uganda 2012.Since we did not expect that the shape of the bird-to-mosquito infection function would vary by strain, we investigated whether knowledge of the long-term per cent mosquito infection can alter these results.Hence, we considered two in silico experimental design studies: one in which we assume known the viral load threshold needed for the entire mosquito population to get infected (hypothesis 1 in the electronic supplementary material) and one in which we assume that the per cent mosquito infection levels off at 60% (hypothesis 2 in the electronic supplementary material).Fitting to both these in silico datasets held the power-law transmission function as the best fit for both in silico viral strains (electronic supplementary material, tables S7 and S8).In future experimental studies, we aim to determine the relationship between per cent mosquito infection and viral load by designing artificial bloodmeal experiments, where mosquitoes are exposed to cotton balls containing blood of increased concentrations of Netherlands 2016 and Uganda 2012 virus, respectively.We have previously conducted such experiments (for a fixed inoculum) when establishing that C. quinquefasciatus mosquitoes are susceptible to USUV [11].
Our study has several limitations.First, the results of the within-host scale predict heterogeneity in virus dynamics between cohorts and within birds of a cohort (including large variability in predicted R 0 estimates among the birds infected with the Uganda 2012 strain).As explained above, some of this is due to uncertainty in model fitting, which can be improved under optimal experimental design where more data measurements (twice a day) and more birds are included.Another layer of uncertainty comes from the fact that the birds were inoculated, rather than infected by mosquitoes.To compensate for that, we considered a lower initial viral load (than the maximum inoculum) when determining parameter estimates in the within-host model.An even lower viral load may initiate the infection (in the wild), which may result in shorter incubation rates, and/or delayed peak viremia.Inoculation studies with varied inoculum dose would be needed to determine the role the inoculum plays in virus dynamics and disease outcomes.
Second, we assumed that all exposed mosquitoes get infected in the presence of high viral titres inside the host they feed on.That may not be realistic, and, adjustments to the probability of infection functions may be needed to account for a lower maximal probability of infection.Moreover, in the age-structured royalsocietypublishing.org/journal/rsos R. Soc.Open Sci.11: 231146 epidemiological model we assumed that an infected mosquito immediately finds an array of hosts to infect.While that may be correct in the laboratory setting, it is not the case in the wild.Third, we assumed fixed mosquito-to-bird contact rates and constant mosquito-to-bird transmission probabilities (regardless of viral dynamics inside an infected mosquito).Relaxing these assumptions may change the current results.Fourth, the use of a limited number of birds and the use of only two bird species (wild-caught house sparrows and juvenile chickens) is another limitation that may prevent us from generalizing the results to other bird populations.Last, we assumed that all parameters are lognormally distributed during data fitting and all noise levels are normally distributed when conducting practical identifiability analyses.Previous studies have shown that the degree of uncertainty in parameter estimates vary for the different noise distributions and estimation methods [40], so our results are limited to our assumptions of distributions.
In conclusion, we investigated the dynamics of two strains of USUV at the within-host scale, bird-tovector transmission scale and vector-borne epidemiological scale.We used individual within-host infectious virus data and per cent mosquito infection data to predict USUV incidence in bird and mosquito populations.We addressed the role of model structure, data uncertainty and optimal experimental design on model predictions.We found that uncertainty in predictions at one scale may change predicted results at another scale.Optimal experimental design was proposed, suggesting that increased data frequency improves predictions.

Figure 3 .
Figure 3. Population median virus dynamics (solid line) given by model equation (2.1) versus infectious virus titres (circles) for birds (wild-caught house sparrows) infected with (a) Netherlands 2016 and (b) Uganda 2012.Model parameters are given in table1.Note that the empty circles account for censored data.

Figure 4 .
Figure 4. Distributions generated by simulations in Monolix (bars) versus theoretical lognormal distributions of model parameters β, δ, c, π and k from fitting model equation (2.1) to infectious virus titre data from birds (wild-caught house sparrows) infected with (a) Netherlands 2016 and (b) Uganda 2012.Mean parameter values μ and standard deviations σ are given in table 1.

Figure 5 .
Figure 5. (a) Probability of infection function b 1 (V(τ)) found by fitting function equation (2.3) (blue lines) to Netherlands 2016 per cent mosquito infection data (red dots, measured on log 10 scale).(b) Probability of infection function b 2 (V(τ)) found by fitting function equation (2.4) (blue lines) to Uganda 2012 per cent mosquito infection data (red dots, measured on log 10 scale).Parameters are given in table4.Green lines show 1000 fitted trajectories to synthetic data with 46% and 34% relative error.

Figure 6 .
Figure 6.USUV incidence in (a) bird populations and (b) mosquito populations as given by equation (2.10) and best parameter estimates from fitting b 1 (V(τ)) given by equation (2.3) to T n and T u data for Netherlands 2016 (solid lines) and from fitting b 2 (V(τ))given by equation (2.4) to T n and T u data for Uganda 2012 data (dashed lines).
numerically using parameter values p obtained from fitting the model to the given discrete experimental data and then record the predictions (output) at the experimental age of infection points.2.We generate virtual datasets by adding s ¼ f1%, 5%, 10%, 20%g measurement errors to each given experimental data point, with measurement errors assumed to be normally distributed with mean zero and standard deviation σ.For each measurement error we create M = 1000 such datasets.3.For each measurement error σ, we fit model equation(2.11)to each of the 1000 datasets to estimate new best parameter fits p i , with i = {1, 2, …, 1000}.4.We calculate the average relative estimation error (ARE) for each parameter of model equation(2.11)Weuse the ARE formula to determine whether each parameter of model equation(2.11) is practically identifiable using definition 2.2, given below.Definition 2.2.Let ARE be the average relative estimation error of the parameter p(k).The practical identifiability of parameter p (k) is determined by comparing ARE with measurement error.Let ARE be smaller than a constant multiple of the measurement error σ, that is AREðp ðkÞ Þ , hs: where p(k)is the kth element of the parameter set p and p ðkÞ i is kth element of p i . 5.

Table 1 .
Distributions for parameters β, δ, c, π and k found by fitting model equation(2.1)toinfectiousvirustitredatafrombirds(wild-caughthouse sparrows) infected with Netherlands 2016 and Uganda 2012.We used the SAEM algorithm in Monolix to generate the predicted population parameter means (ln(μ)) and s.d.(σ) (see Material and methods).We investigated whether model equation (2.1) is practically identifiable (see Material and methods) given the two datasets considered-temporal infectious virus titres from birds (wild-caught house sparrows) infected with USUV strains Netherlands 2016 and Uganda 2012.We found that parameters β and π are weakly practically identifiable for Netherlands 2016 (table2A), and not practically identifiable for Uganda 2012 (table2B).Parameters k and c are strongly practically identifiable for both strains (table 2), and δ is strongly practically identifiable for Netherlands 2016 (table2A), but weakly practically identifiable for Uganda 2012 (table To determine whether these results can be explained by limited data we investigated the practical identifiability of model equation(2.1)forNetherlands2016 and Uganda 2012 data.3.1.3.Practical identifiability of the within-host model

Table 6 .
Monte Carlo simulation results for data sampled at high-frequency.(A) AREs for parameter a obtained from fitting function b 1 (V(τ)) given by equation(2.3)toNetherlands 2016 per cent mosquito infection data when 60 data points are considered; (B) AREs for parameters a and h obtained from fitting function b 2 (V(τ)) given by equation (2.4) to Uganda 2012 per cent mosquito infection data when 60 data points are considered.Parameter a is measured in ml=log 10 PFU for equations (2.3) and (2.4).Parameter h is unitless.AREs give the average relative estimation error for each parameter at noise level σ, and η determines whether each parameter is strongly (0 < η ≤ 1), weakly (1 < η ≤ 10) or non-identifiable (η > 10).